# Video-Language Pretraining

LanguageBind is a language-centric multimodal pretraining method (accepted at ICLR 2024) that uses language as the bond between modalities, extending video-language pretraining to N modalities such as video, audio, infrared, depth, and thermal imaging. All of the checkpoints listed below are published by the LanguageBind organization under the MIT license and are tagged Multimodal Alignment / Transformers.

| Model | Description | Downloads | Likes |
|---|---|---|---|
| LanguageBind Video Huge V1.5 FT | Pretrained model that achieves multimodal semantic alignment through language, binding modalities such as video, audio, depth, and thermal imaging with language to enable cross-modal understanding and retrieval. | 2,711 | 4 |
| LanguageBind Video V1.5 FT | Language-centric multimodal pretraining model that uses language as the bond between modalities to achieve multimodal semantic alignment. | 853 | 5 |
| LanguageBind Audio FT | Language-centric multimodal pretraining model that achieves semantic alignment by using language as the bridge between modalities. | 12.59k | 1 |
| LanguageBind Video FT | Language-centric multimodal pretraining model that uses language as the bond to achieve semantic alignment across video, infrared, depth, audio, and other modalities. | 22.97k | 4 |
| LanguageBind Video Merge | Multimodal model that extends video-language pretraining to N modalities through language-based semantic alignment. | 10.96k | 4 |
| LanguageBind Image | Language-centric multimodal pretraining model that uses language as the bond between modalities to achieve semantic alignment. | 25.71k | 11 |
| LanguageBind Depth | Language-centric multimodal pretraining model that uses language as the bond to achieve semantic alignment across video, infrared, depth, audio, and other modalities. | 898 | 0 |
| LanguageBind Video | Multimodal pretraining framework that extends video-language pretraining to N modalities through language-based semantic alignment. | 166 | 2 |
| LanguageBind Thermal | Pretraining framework that achieves multimodal semantic alignment through language as the bond, supporting joint learning of modalities such as video, infrared, depth, and audio with language. | 887 | 1 |
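
Conceptually, each modality has its own encoder whose outputs are aligned to a shared language embedding space with a CLIP-style symmetric contrastive objective; once aligned, cross-modal retrieval reduces to cosine similarity against a text query. The PyTorch sketch below illustrates that idea only. The encoder outputs, embedding dimension, and temperature are placeholder assumptions, not the actual LanguageBind implementation or API.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(text_emb: torch.Tensor,
                               modality_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style symmetric contrastive loss that pulls each modality
    embedding toward its paired language embedding. Both inputs are
    (batch, dim); row i of each tensor is a matched pair."""
    text_emb = F.normalize(text_emb, dim=-1)
    modality_emb = F.normalize(modality_emb, dim=-1)
    logits = text_emb @ modality_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Matched pairs sit on the diagonal; off-diagonal entries act as negatives.
    loss_t2m = F.cross_entropy(logits, targets)
    loss_m2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2m + loss_m2t)

def retrieve(query_text_emb: torch.Tensor,
             candidate_embs: torch.Tensor,
             top_k: int = 5):
    """Cross-modal retrieval: rank candidate embeddings (video, audio,
    depth, thermal, ...) by cosine similarity to one text query embedding."""
    query = F.normalize(query_text_emb, dim=-1)
    candidates = F.normalize(candidate_embs, dim=-1)
    scores = candidates @ query                           # (num_candidates,)
    return scores.topk(min(top_k, scores.numel()))

if __name__ == "__main__":
    # Placeholder tensors standing in for real encoder outputs
    # (e.g. a video tower and a text tower projecting into a shared space).
    dim, batch = 768, 8                                   # dim is an assumption
    text = torch.randn(batch, dim)
    video = torch.randn(batch, dim)
    print("contrastive loss:", symmetric_contrastive_loss(text, video).item())
    scores, indices = retrieve(text[0], video)
    print("top matches for query 0:", indices.tolist())
```

Because every modality is aligned to the same language space, video, audio, depth, and thermal embeddings can also be compared with one another through that space, which is what the descriptions above mean by binding modalities through language.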